Modeling of L2 Cache Behavior for Thread-Parallel Scientific Programs on Chip Multi-Processors
Abstract
Scientific programs running on a chip multiprocessor (CMP) need high performance. A CMP architecture often has a shared L2 cache and lower storage hierarchy. The shared L2 cache can reduce the number of cache misses when several threads share the same data, but it can also degrade performance through resource contention. In some cases, running threads on all cores causes severe contention and greatly increases the number of cache misses. To investigate how a thread's performance varies when it runs together with other threads on different cores, we develop an analytical model that predicts the number of misses on the shared L2 cache, aimed especially at thread-parallel numerical codes. We assume that the parallel threads work on homogeneous tasks and share a fully associative L2 cache. The stack processing technique and circular sequences are used to analyze the L2 trace and predict three quantities: the number of compulsory misses, capacity misses on shared data, and capacity misses on private data. This is the first work to predict the number of L2 misses for threads that share memory. The model has been validated with three typical scientific programs (matrix multiplication, blocked matrix multiplication, and sparse matrix-vector product) on a variety of matrix sizes. The average relative error lies between 2% and 12%.
*This material is based upon work supported by the National Science Foundation under grant No. 0444363.
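The abstract mentions stack processing on a fully associative cache. As a rough illustration only, the sketch below applies classic LRU stack-distance analysis to a trace of cache-line addresses and separates compulsory misses from capacity misses. It is not the authors' model: it assumes LRU replacement, ignores circular sequences, and does not distinguish shared from private data. The function name `classify_misses` and its parameters are hypothetical.

```python
from collections import OrderedDict


def classify_misses(trace, cache_lines):
    """Count hits, compulsory misses, and capacity misses.

    Assumes a fully associative cache with LRU replacement that holds
    `cache_lines` lines; `trace` is an iterable of cache-line addresses.
    """
    stack = OrderedDict()  # LRU stack: most recently used line is last
    hits = compulsory = capacity = 0
    for line in trace:
        if line not in stack:
            # First reference to this line: a compulsory (cold) miss.
            compulsory += 1
        else:
            # Stack distance: position of the line counted from the
            # most recently used end of the stack (1 = most recent).
            keys = list(stack)
            distance = len(keys) - keys.index(line)
            if distance <= cache_lines:
                hits += 1
            else:
                # The line was evicted before reuse: a capacity miss.
                capacity += 1
            del stack[line]
        stack[line] = None  # move (or insert) the line at the top
    return hits, compulsory, capacity


if __name__ == "__main__":
    trace = [0, 1, 2, 0, 3, 1, 0]
    for size in (2, 3):
        print(size, classify_misses(trace, size))
```

Because a single pass records a stack distance for every reference, the same trace can be evaluated against several cache sizes. Interleaving the traces of several threads before the pass would be one simple way to approximate a shared cache, again as an assumption rather than the paper's method.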
Similar resources
Impact of the Interconnect on Performance and Area/Power for High Core Count (> 8) CMPs
Introduction Faithful to Moore’s law, silicon processing improvements have continually increased the number of transistors available for implementing CPUs within a fixed die area. The designer is left with the choice of how to put those transistors to use. Superscalar processors are organized into parallel pipelines which aggressively seek to execute instructions within a single thread in paral...
Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design
We investigated how operating system design should be adapted for multithreaded chip multiprocessors (CMT) – a new generation of processors that exploit thread-level parallelism to mask the memory latency in modern workloads. We determined that the L2 cache is a critical shared resource on CMT and that an insufficient amount of L2 cache can undermine the ability to hide memory latency on these ...
Power-aware Speed-up for Multithreaded Numerical Linear Algebraic Solvers on Chip Multicore Processors
With the advent of multicore chips new parallel computing metrics and models have become essential for redesigning traditional scientific application libraries tuned to a single chip. In this paper we evolve metrics specific to generalized chip multicore processors (CMP) and use them for parallel performance modeling of numerical linear algebra routines that are commonly available as shared obj...
Prefetch Threads for Database Operations on a Simultaneous Multi-threaded Processor
Simultaneous Multi-threading (SMT) has been developed to increase instruction level parallelism by allowing instructions from a different thread to run during a stall. Inter-thread cache interference, however, might limit the benefit of running multiple independent threads. SMT processors can be utilized in a different model, where a helper thread is used to prefetch cache blocks for the main e...
Building a Domain-Knowledge Guided System Software Environment to Achieve High-Performance of Multi-core Processors
Although multi-core processors have become dominant computing units in basic system platforms from laptops to supercomputers, software development for effectively running various multi-threaded applications on multi-cores has not made much progress, and effective solutions are still limited to high performance applications relying on existing parallel computing technology. In practice, majority ...